Goto

Collaborating Authors

 saliency mask




Review for NeurIPS paper: Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Neural Information Processing Systems

Additional Feedback: ****** post rebutall ****** I do apologize for weighing the relevance issue too much. My major concern is that this paper would only meet the novelty bar if this problem is very relevant, as the major contributions are tightly bounded with the IBSR task. I also defend my comments on the use terms. 'Saliency' in the R.fig1 is a bad example for justifying the word saliency. It is ambiguous in a way that it could be the sofa in the back or the chair in the front.


Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework

arXiv.org Artificial Intelligence

Extract-then-Abstract is a naturally coherent paradigm to conduct abstractive summarization with the help of salient information identified by the extractive model. Previous works that adopt this paradigm train the extractor and abstractor separately and introduce extra parameters to highlight the extracted salients to the abstractor, which results in error accumulation and additional training costs. In this paper, we first introduce a parameter-free highlight method into the encoder-decoder framework: replacing the encoder attention mask with a saliency mask in the cross-attention module to force the decoder to focus only on salient parts of the input. A preliminary analysis compares different highlight methods, demonstrating the effectiveness of our saliency mask. We further propose the novel extract-and-abstract paradigm, ExtAbs, which jointly and seamlessly performs Extractive and Abstractive summarization tasks within single encoder-decoder model to reduce error accumulation. In ExtAbs, the vanilla encoder is augmented to extract salients, and the vanilla decoder is modified with the proposed saliency mask to generate summaries. Built upon BART and PEGASUS, experiments on three datasets show that ExtAbs can achieve superior performance than baselines on the extractive task and performs comparable, or even better than the vanilla models on the abstractive task.


How we remove the background from product images at BestPrice.gr

#artificialintelligence

Product images commonly convey information about the product to the consumers without much processing effort from the consumer's end. Our users do not have the option to physically interact with the product so there is the obvious need for visual descriptors, whose primary job is to provide accurate information about the product. Appealing images take it a step further, they create positive first impressions for products to attract the interest of potential customers. If someone is looking to buy a laptop for example, they may scroll through hundreds of different laptops, stopping for nothing more than mere seconds before moving on. This is common for e-commerce users and one thing that stands out when this behavior is exhibited is that it can be a very lousy experience sometimes if certain criteria are not met.


Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

arXiv.org Machine Learning

Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from Mitrovic et al., 2021, we propose ReLICv2 which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views. ReLICv2 achieves 77.1% top-1 classification accuracy on ImageNet using linear evaluation with a ResNet50 architecture and 80.6% with larger ResNet models, outperforming previous state-of-the-art self-supervised approaches by a wide margin. Most notably, ReLICv2 is the first representation learning method to consistently outperform the supervised baseline in a like-for-like comparison using a range of standard ResNet architectures. Finally we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.


Big GANs Are Watching You: Towards Unsupervised Object Segmentation with Off-the-Shelf Generative Models

arXiv.org Machine Learning

Since collecting pixel-level groundtruth data is expensive, unsupervised visual understanding problems are currently an active research topic. In particular, several recent methods based on generative models have achieved promising results for object segmentation and saliency detection. However, since generative models are known to be unstable and sensitive to hyperparameters, the training of these methods can be challenging and time-consuming. In this work, we introduce an alternative, much simpler way to exploit generative models for unsupervised object segmentation. First, we explore the latent space of the BigBiGAN -- the state-of-the-art unsupervised GAN, which parameters are publicly available. We demonstrate that object saliency masks for GAN-produced images can be obtained automatically with BigBiGAN. These masks then are used to train a discriminative segmentation model. Being very simple and easy-to-reproduce, our approach provides competitive performance on common benchmarks in the unsupervised scenario.


Learn to Interpret Atari Agents

arXiv.org Machine Learning

Deep Reinforcement Learning (DeepRL) agents surpass human-level performances in a multitude of tasks. However, the direct mapping from states to actions makes it hard to interpret the rationale behind the decision making of agents. In contrast to previous a-posteriori methods of visualizing DeepRL policies, we propose an end-to-end trainable frameworkbased on Rainbow, a representative Deep Q-Network (DQN) agent. Our method automatically learns important regions in the input domain, which enables characterizations of the decision makingand interpretations for non-intuitive behaviors. Hence we name it Region Sensitive Rainbow (RS-Rainbow). RS-Rainbow utilizes a simple yet effective mechanism to incorporate visualization ability into the learning model, not only improving model interpretability, but leading to improved performance. Extensive experiments on the challenging platform of Atari 2600 demonstrate thesuperiority of RS-Rainbow. In particular, our agent achieves state of the art at just 25% of the training frames. Demonstrations and code are available at https://github.com/yz93/Learn-to-


Real Time Image Saliency for Black Box Classifiers

Neural Information Processing Systems

In this work we develop a fast saliency detection method that can be applied to any differentiable image classifier. We train a masking model to manipulate the scores of the classifier by masking salient parts of the input image. Our model generalises well to unseen images and requires a single forward pass to perform saliency detection, therefore suitable for use in real-time systems. We test our approach on CIFAR-10 and ImageNet datasets and show that the produced saliency maps are easily interpretable, sharp, and free of artifacts. We suggest a new metric for saliency and test our method on the ImageNet object localisation task. We achieve results outperforming other weakly supervised methods.


Real Time Image Saliency for Black Box Classifiers

arXiv.org Machine Learning

University of Cambridge and Alan Turing Institute, London In this work we develop a fast saliency detection method that can be applied to any differentiable image classifier. We train a masking model to manipulate the scores of the classifier by masking salient parts of the input image. Our model generalises well to unseen images and requires a single forward pass to perform saliency detection, therefore suitable for use in real-time systems. We test our approach on CIFAR-10 and ImageNet datasets and show that the produced saliency maps are easily interpretable, sharp, and free of artifacts. We suggest a new metric for saliency and test our method on the ImageNet object localisation task. We achieve results outperforming other weakly supervised methods.